Query Hardness Estimation Using Jensen-Shannon Divergence Among Multiple Scoring Functions

Authors

  • Javed A. Aslam
  • Virgil Pavlu
Abstract

We consider the issue of query performance, and we propose a novel method for automatically predicting the difficulty of a query. Unlike a number of existing techniques which are based on examining the ranked lists returned in response to perturbed versions of the query with respect to the given collection or perturbed versions of the collection with respect to the given query, our technique is based on examining the ranked lists returned by multiple scoring functions (retrieval engines) with respect to the given query and collection. In essence, we propose that the results returned by multiple retrieval engines will be relatively similar for “easy” queries but more diverse for “difficult” queries. By appropriately employing Jensen-Shannon divergence to measure the “diversity” of the returned results, we demonstrate a methodology for predicting query difficulty whose performance exceeds existing state-of-the-art techniques on TREC collections, often remarkably so.
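
For reference, the generalized Jensen-Shannon divergence among k distributions P_1, ..., P_k with weights w_1, ..., w_k (a standard definition, not specific to this paper) is the entropy of the weighted mixture minus the weighted average of the entropies:

    \mathrm{JS}_w(P_1, \ldots, P_k) = H\!\left(\sum_{i=1}^{k} w_i P_i\right) - \sum_{i=1}^{k} w_i\, H(P_i), \qquad \sum_{i=1}^{k} w_i = 1,

where H denotes Shannon entropy. A minimal sketch of how such a measure could be applied to ranked lists follows; the mapping from a ranked list to a probability distribution (a hypothetical 1/(rank + 1) weighting with light smoothing) and the uniform weighting of engines are illustrative assumptions, not the paper's exact construction.

    import math

    def rank_distribution(ranked_list, pool):
        # Hypothetical mapping: weight the document at rank r by 1/(r + 1),
        # give unretrieved pool documents a tiny smoothing mass, and normalize.
        weights = {doc: 1e-6 for doc in pool}
        for r, doc in enumerate(ranked_list):
            weights[doc] += 1.0 / (r + 1)
        total = sum(weights.values())
        return {doc: w / total for doc, w in weights.items()}

    def entropy(p):
        # Shannon entropy in bits.
        return -sum(v * math.log2(v) for v in p.values() if v > 0)

    def jensen_shannon(dists):
        # Generalized JS divergence with uniform weights: the entropy of the
        # average distribution minus the average of the individual entropies.
        k = len(dists)
        pool = set().union(*(d.keys() for d in dists))
        mixture = {doc: sum(d.get(doc, 0.0) for d in dists) / k for doc in pool}
        return entropy(mixture) - sum(entropy(d) for d in dists) / k

    # Toy example: three engines' top-5 lists for one query; a larger value
    # would be read as a more "difficult" query under this sketch.
    lists = [["d1", "d2", "d3", "d4", "d5"],
             ["d1", "d3", "d2", "d6", "d4"],
             ["d7", "d8", "d1", "d9", "d2"]]
    pool = {doc for lst in lists for doc in lst}
    print(jensen_shannon([rank_distribution(lst, pool) for lst in lists]))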


Similar articles

Discrimination Measure of Correlations in a Population of Neurons by Using the Jensen-Shannon Divergence

The significance of synchronized spikes fired by nearby neurons for perception is still unclear. To evaluate how reliably one can decide whether a given response, in the population coding of sensory information, comes from the full joint distribution or from the product of independent distributions for each cell, we used recorded responses of pairs of single neurons in the primary visual cortex of macaque mon...


Active Learning for Probability Estimation Using Jensen-Shannon Divergence

Active selection of good training examples is an important approach to reducing data-collection costs in machine learning; however, most existing methods focus on maximizing classification accuracy. In many applications, such as those with unequal misclassification costs, producing good class probability estimates (CPEs) is more important than optimizing classification accuracy. We introduce no...


A Note on Bound for Jensen-Shannon Divergence by Jeffreys

We present a lower bound on the Jensen-Shannon divergence in terms of the Jeffreys divergence when p_i ≥ q_i is satisfied. In Lin's original paper [IEEE Trans. Info. Theory, 37, 145 (1991)], where the divergence was introduced, the upper bound in terms of the Jeffreys divergence was a quarter of it. In view of a recent sharper one reported by Crooks, we present a discussion of upper bounds by transcendental fu...


Alpha-Divergence for Classification, Indexing and Retrieval (Revised 2)

Motivated by Chernoff’s bound on asymptotic probability of error, we propose the alpha-divergence measure and a surrogate, the alpha-Jensen difference, for feature classification, indexing and retrieval in image and other databases. The alpha-divergence, also known as the Rényi divergence, is a generalization of the Kullback-Leibler divergence and the Hellinger affinity between the probability densi...


Bounds on Non-Symmetric Divergence Measures in Terms of Symmetric Divergence Measures

Many information and divergence measures exist in the literature on information theory and statistics. The most famous among them are the Kullback-Leibler [13] relative information and the Jeffreys [12] J-divergence. The Sibson [17] Jensen-Shannon divergence has also found applications in the literature. The author [20] studied new divergence measures based on arithmetic and geometric means....




Publication year: 2007